net: remember the name of the lock chain (nftables) #2550
base: criu-dev
c483710 to 0305093
Hi Adrian! Could you tell us how and under which circumstances you caught this issue? As far as I understand, the idea of your fix is to keep the nftables table name in the inventory image file instead of dynamically recalculating it on restore (using A first question I have after going through this is "How has this worked before?". P.S. I'll take a closer look into this. I haven't spent enough time yet to fully understand what's going on there.
It probably never did. We are not running all the tests on a system without iptables and with the nftables locking backend. Only two or four tests are running with the nftables backend.
I am trying to switch the default locking backend in Fedora and CentOS >= 10 to nftables from iptables because iptables is no longer installed by default.
Yes. The table name makes sense if the locking and unlocking happen in the same CRIU run, but between CRIU runs it does not work with the existing approach.
Ah, thanks for clarifications! I wonder if we can do something like this:
Yes, it's not a forward-compatible change and will break restore of images which were dumped with an older CRIU. In this form only works for experimental purposes (and have to check for |
My idea is that instead of introducing a new field
@mihalicyn I am happy to use whatever makes most sense. What is
I don't think we have to worry about this. Currently it doesn't work at all. Let me know which ID makes most sense and I can rework this PR. I think the important part is that it has to come from some value of the checkpoint image and not be generated during restore.
@mihalicyn I think I understood your proposal now. The PR could be really simple as pid_ns_id is already in the image. Let me try it out. |
With this line it also passes all the zdtm test cases (besides a couple of tests which call iptables (which I did not install)) if I switch to the nftables locking backend:
That brings it down to a one line change. Very good idea @mihalicyn. Thanks. How long can the pid_ns_id be? Currently the variable |
@mihalicyn Tests are happy, but So that is not really a good idea, I think, as it is not really unique.
Hey Adrian,
Yes, precisely.
We don't as we already have it in the image anyways.
Are we 100% sure that it doesn't work and never worked in any circumstances?
Hmm, it's
That's my bad, actually, to get pid namespace inode number you need something like:
But yes, I don't think that even with this change having pid_ns_id would be enough, I think we still need to add a new field to |
Also, we have |
Ah, okay. So let's use the
I don't know. All tests with open TCP connections are just hanging during restore because the network locking cannot be disabled. According to zdtm it is so broken that it doesn't work currently.
As an additional field in the nft table name? Or instead of |
Would it be possible to add a CI workflow or modify an existing one to run all tests with the nftables backend? |
Using libnftables the chain to lock the network is composed of ("CRIU-%d", real_pid). This leads to around 40 zdtm tests failing with errors like this:

    Error: No such file or directory; did you mean table 'CRIU-62' in family inet?
    delete table inet CRIU-86

The reason is that as soon as a process is running in a namespace the real PID can be anything and only the PID in the namespace is restored correctly. Relying on the real PID does not work for the chain name.

Using the PID of the innermost namespace would lead to the chain being called 'CRIU-1' most of the time, which is also not really unique.

With this commit the chain is now named using the already existing CRIU run ID. To be able to correctly restore the process and delete the locking table, the CRIU run ID from checkpointing is now stored in the inventory as dump_criu_run_id.

Signed-off-by: Adrian Reber <areber@redhat.com>
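As a minimal sketch of the naming scheme this commit message describes, the snippet below formats a table name from a 64-bit run ID with the same format string that appears in the diff under review. The helper name `lock_table_name` and the truncation check are assumptions made for illustration; this is not CRIU's actual function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not CRIU code): write "inet CRIU-<hex id>" into buf.
 * Returns 0 on success, -1 if snprintf() fails or the name does not fit.
 */
static int lock_table_name(char *buf, size_t n, uint64_t run_id)
{
	int ret = snprintf(buf, n, "inet CRIU-%" PRIx64, run_id);

	/* snprintf() returns the length it wanted to write, excluding the NUL */
	if (ret < 0 || (size_t)ret >= n)
		return -1;

	return 0;
}
```

Because the ID is stored in the image, the same call on dump and on restore would yield the same table name, which is the property the real PID cannot provide. With a 64-bit ID the longest name is 10 prefix characters plus 16 hex digits, so 27 bytes including the terminator, which fits in a 32-byte buffer.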
0305093 to 30e76fd
@mihalicyn What do you think about the latest version? This works in my tests just as well as the previous version. Now using criu_run_id as suggested.
Hi Adrian!
Looks great! The only thing that worries me is that idea behind Also, I tried to play with nftables-based locking mechanism on my machine, and found, that for
While for |
Just for the sake of demonstrating my point. Something like this:
fixes tests for |
Right, because
You are right. It works for
No real opinion here, but it might be good. No idea.
We can just deprecate the protobuf field at some point in the future and use a new one if we feel that is necessary. |
@@ -229,6 +229,8 @@ static const char *unix_conf_entries[] = {
"max_dgram_qlen",
};

extern char nft_lock_table[32];
I guess, we don't need this anymore.
* information is needed to identify the name of the network
* locking table.
*/
dump_criu_run_id = he->dump_criu_run_id;
if (he->has_dump_criu_run_id) {
dump_criu_run_id = he->dump_criu_run_id;
}
{
- if (snprintf(table, n, "inet CRIU-%d", root_item->pid->real) < 0) {
+ if (snprintf(table, n, "inet CRIU-%" PRIx64, id) < 0) {
/*
 * Keep compatibility with images
 * without he->dump_criu_run_id field.
 */
if (!id) {
	if (!(root_ns_mask & CLONE_NEWPID)) {
		id = root_item->pid->real;
	} else {
		pr_err("Cannot generate CRIU's nftables table name because of issue #2550\n");
		return -1;
	}
}
if (snprintf(table, n, "inet CRIU-%" PRIx64, id) < 0) {
What do you think about this?
I agree.
Yeah, I agree. I just wanted to make sure that I understood the problem correctly, and this code example is the best way to show the different scenarios we have, and when it works and when it doesn't. But we still need some extra checks for compatibility reasons, IMHO. In general, this PR looks great to me. Thanks for working on this, Adrian!
if (nftables_get_table(table, sizeof(table)))
if (dump_criu_run_id == 0)
I think we need to introduce a boolean parameter here to determine whether we are on the restore or the dump codepath, as dump_criu_run_id == 0 can occur in two different cases: when we deal with an old image (without the dump_criu_run_id field) or if we are on the restore codepath.
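One way to picture the boolean parameter suggested here is the sketch below. It is only an illustration under stated assumptions: the function name `pick_lock_table_id`, its parameters, and the idea of passing the field's presence as a separate flag are hypothetical and do not come from the PR.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not CRIU code): choose the ID used in the
 * nftables lock table name.
 *
 * restoring        - true on the restore path, false on the dump path
 * criu_run_id      - ID of the current CRIU run
 * has_dump_id      - whether the image carried dump_criu_run_id
 * dump_criu_run_id - value read from the image, if present
 *
 * Returns 0 and stores the ID in *id, or -1 when restoring from an
 * image that lacks the field, so the caller can decide on a fallback.
 */
static int pick_lock_table_id(bool restoring, uint64_t criu_run_id,
			      bool has_dump_id, uint64_t dump_criu_run_id,
			      uint64_t *id)
{
	if (!restoring) {
		/* Dump path: name the table after the current run. */
		*id = criu_run_id;
		return 0;
	}

	if (has_dump_id) {
		/* Restore path: reuse the ID recorded at dump time. */
		*id = dump_criu_run_id;
		return 0;
	}

	/* Restore path with an old image: no recorded ID available. */
	return -1;
}
```

Making the codepath explicit means a zero value no longer has to stand in for two different situations; the old-image case is reported separately instead of being confused with a run that has not read an image.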
It was broken by 9838d34.
Using libnftables the chain to lock the network is composed of ("CRIU-%d", real_pid). This leads to around 40 zdtm tests failing with errors like this:

    Error: No such file or directory; did you mean table 'CRIU-62' in family inet?
    delete table inet CRIU-86

The reason is that as soon as a process is running in a namespace the real PID can be anything and only the PID in the namespace is restored correctly. Relying on the real PID does not work for the chain name.
Using the PID of the innermost namespace would lead to the chain being called 'CRIU-1' most of the time, which is also not really unique.
The uniqueness of the name was always problematic. With this change, all tests that rely on network locking work again when the nftables backend is used.